Assignment1

1.1 Word cloud

We create two word clouds, one for the pleased and one for the displeased feedback left by customers who bought the Casio AMW320R-1EV watch at www.amazon.com. The cloud for the pleased feedback is shown below. Evidently, “watch” is the most frequent word in the pleased feedback, followed by “one” and “the”. Customers also often use words such as “price”, “time”, “battery”, “band”, “display”, “great” and “years”.

This suggests that customers care most about aspects such as price, time and battery, and that their feedback on these aspects is positive.

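#1.1.1
# Read the pleased feedback, one review per line
data<-read.table("Five.txt",header=F, sep='\n')
data$doc_id=1:nrow(data)
colnames(data)[1]<-"text"
# Build the corpus, then strip punctuation and English stopwords
mycorpus <- Corpus(DataframeSource(data))
mycorpus <- tm_map(mycorpus, removePunctuation)
# removeWords() is case-sensitive and the text is not lowercased first,
# so capitalised stopwords such as "The" are kept at this step
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
# Term-document matrix (lowercased by default), then total frequency per word
tdm <- TermDocumentMatrix(mycorpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
# Use four of the six "Dark2" colours (drop the first two)
pal <- brewer.pal(6,"Dark2")
pal <- pal[-(1:2)]
# Up to 60 words with frequency >= 3, plotted in decreasing frequency
wordcloud(d$word,d$freq, scale=c(8,.7),min.freq=3,max.words=60, random.order=F, rot.per=.15, colors=pal, vfont=c("sans serif","plain"))
# cex.main enlarges the title (font.main only selects an integer font face)
title(main = "Word Cloud for Five.txt", cex.main=1.5)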
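The frequency table behind the cloud can be sketched in base R. This is an illustration only and does not replace the tm pipeline above. The toy reviews, the short stopword list and the helper name `count_words` are hypothetical. The sketch also shows why “the” can survive stopword removal. Stopwords are lowercase and are removed before the text is lowercased, so a sentence-initial “The” is kept. It is only merged into “the” when counting.

```r
# Hypothetical helper: word frequencies after punctuation and stopword removal.
# lower_first = FALSE mimics the order used above (stopwords removed before
# lowercasing); lower_first = TRUE lowercases the text first.
count_words <- function(texts, stopw, lower_first = FALSE) {
  texts <- gsub("[[:punct:]]", "", texts)           # like removePunctuation
  if (lower_first) texts <- tolower(texts)
  words <- unlist(strsplit(texts, "[[:space:]]+"))  # split on whitespace
  words <- words[nzchar(words)]
  words <- words[!words %in% stopw]                 # case-sensitive, like removeWords
  sort(table(tolower(words)), decreasing = TRUE)    # lowercase when counting, like the TDM
}

reviews <- c("The watch is great for the price.",
             "The band broke but the watch still works.",
             "Great price, great watch.")
stop_words <- c("the", "is", "for", "but", "still")  # toy list, not stopwords("english")

count_words(reviews, stop_words)                      # "the" survives via "The"
count_words(reviews, stop_words, lower_first = TRUE)  # "the" is removed
```

In the tm code, the equivalent change would be to lowercase the corpus before calling `removeWords`. This would also change the clouds discussed in this section.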

The next word cloud is created from the displeased feedback. The word “watch” is still the most frequent, and “the”, “time” and “casio” are also very common. Other frequent words include “battery”, “replace”, “back”, “amazon” and “just”.

“Time”, “price” and “battery” appear here as well, which supports our assumption that these are the aspects customers care about most. However, the feedback on these aspects may be negative or mixed, since words such as “stopped”, “replacement”, “just” and “stop” also appear.

#1.1.2
# Read the displeased feedback, one review per line
data<-read.table("OneTwo.txt",header=F, sep='\n')
data$doc_id=1:nrow(data)
colnames(data)[1]<-"text"
# Same cleaning as in 1.1.1 (stopwords are removed before lowercasing)
mycorpus <- Corpus(DataframeSource(data))
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
# Term-document matrix (lowercased by default), then total frequency per word
tdm <- TermDocumentMatrix(mycorpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
# Use four of the six "Dark2" colours (drop the first two)
pal <- brewer.pal(6,"Dark2")
pal <- pal[-(1:2)]
wordcloud(d$word,d$freq, scale=c(8,.7),min.freq=3,max.words=60, random.order=F, rot.per=.15, colors=pal, vfont=c("sans serif","plain"))
# cex.main enlarges the title (font.main only selects an integer font face)
title(main = "Word Cloud for OneTwo.txt", cex.main=1.5)
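The claim that “time”, “price” and “battery” matter to both groups rests on these words ranking high in both clouds. One way to make this check explicit is to intersect the top-k words of the two frequency vectors. Below is a base R sketch. The helper `shared_top` and the toy counts are hypothetical and do not come from the review data. The second chunk overwrites `v`, so the first vector would need to be saved under another name before this could be run on the real frequencies.

```r
# Hypothetical helper: words that are in the top k of both named frequency vectors
shared_top <- function(freq_a, freq_b, k = 20) {
  top_a <- names(sort(freq_a, decreasing = TRUE))[seq_len(min(k, length(freq_a)))]
  top_b <- names(sort(freq_b, decreasing = TRUE))[seq_len(min(k, length(freq_b)))]
  intersect(top_a, top_b)  # keeps the order of the first vector
}

# Toy frequencies for illustration only, not the real review counts
pleased    <- c(watch = 50, great = 30, price = 20, time = 15, battery = 12, band = 9)
displeased <- c(watch = 25, time = 10, battery = 8, stopped = 6, price = 5, alarm = 4)

shared_top(pleased, displeased, k = 4)  # returns c("watch", "time")
```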

1.2 Phrase net

Next, we create a phrase net for both the pleased and the displeased feedback, using the connector words “am”, “is”, “are”, “was” and “were” and the 60 most common words. “Watch” and “I” are the most frequent words in both nets. In the pleased feedback, “I” and “watch” are connected to many positive adjectives such as “awesome”, “unbeatable”, “durable” and “happy”.

Phrase net

In the phrase net for the displeased feedback, however, “I” and “watch” are connected to negative adjectives such as “disappointed”, “sad” and “defective”. Besides “watch”, “alarm” is the word most often connected with “defective”.

Phrase net
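A phrase net draws an edge from word X to word Y whenever the text contains “X connector Y”. Its raw material is therefore a count of such pairs. Below is a minimal base R sketch of that counting step. It assumes lowercased text and single-word X and Y. The helper `phrase_pairs` and the toy reviews are hypothetical, and this is not the implementation of the tool used to draw the nets above.

```r
# Hypothetical helper: count "X <connector> Y" pairs, i.e. the edges of a phrase net.
# Assumes lowercased text and single-word X and Y.
phrase_pairs <- function(texts, connectors = c("am", "is", "are", "was", "were")) {
  # replace punctuation (except apostrophes) by spaces, then lowercase
  texts <- tolower(gsub("[^[:alnum:][:space:]']", " ", texts))
  pattern <- paste0("\\b([[:alnum:]']+)\\s+(", paste(connectors, collapse = "|"),
                    ")\\s+([[:alnum:]']+)\\b")
  hits <- unlist(regmatches(texts, gregexpr(pattern, texts, perl = TRUE)))
  # keep only the two end words of each match: "watch is great" -> "watch -> great"
  edges <- sub(pattern, "\\1 -> \\3", hits, perl = TRUE)
  sort(table(edges), decreasing = TRUE)
}

reviews <- c("The watch is great. I am happy!",
             "I was disappointed, the alarm was defective.")
phrase_pairs(reviews)
```

Edge weights are the pair counts. In a real net, they would be summed over all reviews and only the most frequent words would be kept.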

1.3 Word tree

1.3.1 Pleased feedbacks

Afterwards, we create several word trees for the feedback. The first tree is built from the pleased feedback, but it is difficult to extract useful information from such a general tree.

Word tree

We then create new word trees rooted at keywords taken from the word cloud and the phrase net, such as “price”, “band” and “battery”. “Great price” is clearly the phrase people mention most often: many customers are satisfied with getting a good-looking watch at a low price compared with other watches.

Word tree
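The first level of a word tree counts the words that directly follow the chosen keyword. In a reversed tree, it counts the words that directly precede it. As a hedged illustration of what the “price” tree summarises, the sketch below counts those neighbours in base R. The helper `neighbour_words` and the toy reviews are hypothetical, and the word-tree tool used above may tokenise the text differently.

```r
# Hypothetical helper: first level of a word tree rooted at `keyword`,
# i.e. how often each word directly follows (or precedes) it.
# `keyword` is assumed to be a single lowercase word.
neighbour_words <- function(texts, keyword, side = c("after", "before")) {
  side <- match.arg(side)
  # lowercase, turn punctuation (except apostrophes) into spaces, split into words
  tokens <- strsplit(tolower(gsub("[^[:alnum:][:space:]']", " ", texts)), "\\s+")
  out <- unlist(lapply(tokens, function(w) {
    w <- w[nzchar(w)]
    pos <- which(w == keyword)
    pos <- if (side == "after") pos + 1 else pos - 1
    w[pos[pos >= 1 & pos <= length(w)]]  # drop positions past either end
  }))
  sort(table(as.character(out)), decreasing = TRUE)
}

reviews <- c("Great price for a great watch.",
             "Great price and the band is comfortable.",
             "Fair price, but the battery died.")
neighbour_words(reviews, "price", side = "before")  # "great" twice, "fair" once
neighbour_words(reviews, "price")                   # "for", "and", "but"
```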

Remarkably, some satisfied customers also mention that the battery is not good.

Word tree
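A word tree only shows fragments of each sentence. A reading such as “the battery is not good” is therefore worth checking against the full sentences. Below is a minimal keyword-in-context filter in base R. It assumes that sentences end with “.”, “!” or “?” and that the keyword contains no regex metacharacters. The helper `sentences_with` and the toy reviews are hypothetical.

```r
# Hypothetical helper: return every sentence that mentions `keyword` (case-insensitive)
sentences_with <- function(texts, keyword) {
  # split after ".", "!" or "?" followed by whitespace
  sents <- unlist(strsplit(texts, "(?<=[.!?])\\s+", perl = TRUE))
  sents[grepl(paste0("\\b", keyword, "\\b"), sents, ignore.case = TRUE, perl = TRUE)]
}

reviews <- c("Looks great. The battery died after a year! Still a good buy.",
             "Accurate and cheap.")
sentences_with(reviews, "battery")  # returns "The battery died after a year!"
```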

1.3.2 Displeased feedbacks

On the other hand, we create several word trees for the displeased feedback. Although the sample of negative feedback is not very large, it points to some possible issues with the Casio AMW320R-1EV. The first tree is the original tree (keyword “seems”): many customers mention that the analog part of the watch sometimes does not work.

Word tree

The following tree is rooted at “alarm”, the word the phrase net linked to “defective”. Several customers find the alarm unusable and defective, and these customers may also dislike the watch's chronometer.

Word tree

With “display” from the phrase net as the keyword, the tree shows that, besides the chronometer display, the digital display is also problematic for some customers.

Word tree

However, “great price” is frequently mentioned in the displeased feedback as well.

Word tree

Conclusion

  • Satisfied customers describe the watch as reliable (it doesn't stop), best-looking, robust and an awesome watch for the money, with a comfortable band, suitable for both casual and sport use, and combining analog and digital displays. Most customers are happy with its accuracy and water resistance. Overall, the Casio AMW320R-1EV leaves these customers with a good impression.

  • Unsatisfied customers complain about the poor luminosity of the Casio AMW320R-1EV's display. Other problems, such as the cheap rubber band, the unreliable analog part, the watch stopping and the watch getting stuck in alarm mode, make these customers unwilling to buy the watch again.

  • Both satisfied and unsatisfied customers mention the “great price” and the battery problems of the Casio AMW320R-1EV.

  • In more detail, most customers consider the Casio AMW320R-1EV a watch worth buying at a great price. However, these graphs point to problems with the unreliable analog part, with the poorly designed digital and chronometer displays and with the alarm. CASIO's designers and after-sales service department should pay attention to the repair and replacement of such products.

Assignment2

2.1 Scatter plot of eicosenoic versus linoleic

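#create shared data (olive is read from olive.csv in the code listed in the appendix)
olive_shared <- SharedData$new(olive)

#2.1
# Interactive scatter plot of eicosenoic versus linoleic acid
scatter_olive <- plot_ly(olive_shared, type = "scatter", x = ~eicosenoic, y = ~linoleic)
scatter_olive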

There is a distinct group of oils with low eicosenoic acid content, whose values are only 1, 2 and 3.
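To see where this low-eicosenoic group comes from, one could cross-tabulate it against `Region`. Below is a minimal sketch. It assumes the `olive` data frame read in the appendix (columns `eicosenoic` and `Region`) and treats 3 as the cut-off read off the plot. The helper name `low_eicosenoic_by_region` is hypothetical. The toy data frame only demonstrates the call and says nothing about the real oils.

```r
# Hypothetical helper: cross-tabulate low-eicosenoic oils against Region
low_eicosenoic_by_region <- function(df, threshold = 3) {
  low <- df$eicosenoic <= threshold
  table(low_eicosenoic = low, Region = df$Region)
}

# Toy data for illustration only; with the real data call low_eicosenoic_by_region(olive)
toy <- data.frame(eicosenoic = c(1, 2, 3, 25, 30, 2),
                  Region = factor(c(2, 3, 2, 1, 1, 3)))
low_eicosenoic_by_region(toy)
```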

Appendix

The full code of this report is listed below.

library(tm)
library(wordcloud)
library(RColorBrewer)
library(plotly)
library(crosstalk)
library(tidyr)
library(GGally)
set.seed(15)

knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message=FALSE, include=TRUE)
#1.1.1
data<-read.table("Five.txt",header=F, sep='\n')
data$doc_id=1:nrow(data)
colnames(data)[1]<-"text"
mycorpus <- Corpus(DataframeSource(data))
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, function(x) removeWords(x, stopwords("english")))
tdm <- TermDocumentMatrix(mycorpus) 
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(6,"Dark2")
pal <- pal[-(1:2)] 
wordcloud(d$word,d$freq, scale=c(8,.7),min.freq=3,max.words=60, random.order=F, rot.per=.15, colors=pal, vfont=c("sans serif","plain"))
title(main = "Word Cloud for Five.txt", cex.main=1.5)
#1.1.2
data<-read.table("OneTwo.txt",header=F, sep='\n') 
data$doc_id=1:nrow(data)
colnames(data)[1]<-"text"
mycorpus <- Corpus(DataframeSource(data))
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, function(x) removeWords(x, stopwords("english")))
tdm <- TermDocumentMatrix(mycorpus) 
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE) 
d <- data.frame(word = names(v),freq=v) 
pal <- brewer.pal(6,"Dark2")
pal <- pal[-(1:2)] 
wordcloud(d$word,d$freq, scale=c(8,.7),min.freq=3,max.words=60, random.order=F, rot.per=.15, colors=pal, vfont=c("sans serif","plain"))
title(main = "Word Cloud for OneTwo.txt", cex.main=1.5)
input_path <- "olive.csv"
olive <- read.csv(file = input_path)
olive$Region <- as.factor(olive$Region)
#create shared data
olive_shared <- SharedData$new(olive)

#2.1
scatter_olive <- plot_ly(olive_shared, type = "scatter", x = ~eicosenoic, y = ~linoleic)
scatter_olive
#2.2
bar_olive <-plot_ly(olive_shared, x=~Region)%>%add_histogram()%>%layout(barmode="overlay")

bscols(widths=c(2, NA),filter_slider("R", "Stearic", olive_shared, ~stearic),
       subplot(scatter_olive,bar_olive)%>%
         highlight(on="plotly_select", dynamic=T, persistent = T, opacityDim = I(1))%>%hide_legend())

ggplot(olive, aes(y = linoleic, x = stearic))+
  geom_point()+
  geom_smooth(method = "loess")
#2.3
scatter_olive2 <- plot_ly(olive_shared, type = "scatter", x = ~arachidic, y = ~linolenic)%>%
  add_markers(color = I("lightblue"))
subplot(scatter_olive,scatter_olive2)%>%
  highlight(on="plotly_select", dynamic=T, persistent=T, opacityDim = I(1))%>%hide_legend()
#2.4
p<-ggparcoord(olive, columns = c(4:11))

d<-plotly_data(ggplotly(p))%>%group_by(.ID)
d1<-SharedData$new(d, ~.ID, group="olive")
p1<-plot_ly(d1, x=~variable, y=~value)%>%
  add_lines(line=list(width=0.3))%>%
  add_markers(marker=list(size=0.3),
              text=~.ID, hoverinfo="text")

p2<-plot_ly(d1, x=~factor(Region) )%>%add_histogram()%>%layout(barmode="overlay")


ButtonsX=list()
for (i in 2:11){
  ButtonsX[[i-1]]= list(method = "restyle",
                        args = list( "x", list(olive[[i]])),
                        label = colnames(olive)[i])
}
ButtonsY=list()
for (i in 2:11){
  ButtonsY[[i-1]]= list(method = "restyle",
                        args = list( "y", list(olive[[i]])),
                        label = colnames(olive)[i])
}
ButtonsZ=list()
for (i in 2:11){
  ButtonsZ[[i-1]]= list(method = "restyle",
                        args = list( "z", list(olive[[i]])),
                        label = colnames(olive)[i])
}

olive2=olive[, 2:11]
olive2$.ID=1:nrow(olive)
d2<-SharedData$new(olive2, ~.ID, group="olive")

p3 <- plot_ly(d2, x = ~eicosenoic, y = ~linoleic, z= ~oleic, alpha = 0.8) %>%
  add_markers() %>%
  layout(scene = list(
    xaxis=list(title=""), 
    yaxis=list(title=""),
    zaxis=list(title="")
    ),
    updatemenus = list(
           list(y=0.9, buttons = ButtonsX),
           list(y=0.7, buttons = ButtonsY),
           list(y=0.5, buttons = ButtonsZ)
         ) 
  )

#show
ps<-htmltools::tagList(p1%>%
                         highlight(on="plotly_select", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(),
                       p2%>%
                         highlight(on="plotly_select", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend(),
                       p3%>%
                         highlight(on="plotly_select", dynamic=T, persistent = T, opacityDim = I(1))%>%
                         hide_legend()
)
htmltools::browsable(ps)
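The three loops that build `ButtonsX`, `ButtonsY` and `ButtonsZ` differ only in the attribute they restyle. Below is a sketch of a helper that builds the same list structure with `lapply`. The name `make_buttons` is hypothetical. The list layout mirrors the loops above, so it should be a drop-in replacement, but it has not been checked against the rendered 3D plot.

```r
# Hypothetical helper: one plotly "restyle" button per column in `cols`,
# with the same list layout as the ButtonsX/ButtonsY/ButtonsZ loops
make_buttons <- function(df, attr, cols = 2:11) {
  lapply(cols, function(i) {
    list(method = "restyle",
         args = list(attr, list(df[[i]])),
         label = colnames(df)[i])
  })
}

# Usage with the olive data read above:
# ButtonsX <- make_buttons(olive, "x")
# ButtonsY <- make_buttons(olive, "y")
# ButtonsZ <- make_buttons(olive, "z")
```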